Noncentral F distribution (ncf)#

The noncentral F distribution is the distribution of many classical F-statistics under an alternative hypothesis. It generalizes the central F distribution by introducing a noncentrality parameter that captures “signal strength”.

Learning goals#

  • Understand the generative story: ratio of (noncentral) chi-square variables.

  • Relate ncf to hypothesis testing (ANOVA / regression) and power analysis.

  • Write down the PDF/CDF (as a Poisson mixture) and interpret parameters.

  • Derive mean/variance and compute higher moments (skewness/kurtosis) when they exist.

  • Implement a NumPy-only sampler and validate it with Monte Carlo.

  • Use scipy.stats.ncf for pdf, cdf, rvs, and fit.

Prerequisites#

  • Chi-square and F distributions; expectation/variance

  • Some comfort with special functions (Beta, Gamma) and log-likelihoods

Notebook roadmap#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations (Expectation, Variance, Likelihood)

  7. Sampling & Simulation (NumPy-only)

  8. Visualization (PDF, CDF, Monte Carlo)

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import numpy as np

import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

from scipy import optimize, special, stats

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)

SEED = 7
rng = np.random.default_rng(SEED)
import sys
import scipy
import plotly

print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("SciPy:", scipy.__version__)
print("Plotly:", plotly.__version__)
print("Seed:", SEED)
Python: 3.12.9
NumPy: 1.26.2
SciPy: 1.15.0
Plotly: 6.5.2
Seed: 7

1) Title & Classification#

  • Name: ncf (Noncentral F distribution; SciPy: scipy.stats.ncf)

  • Type: Continuous

  • Support (standard form): \(x \in (0,\infty)\)

  • Parameter space (standard form):

    • numerator degrees of freedom \(\nu_1 > 0\) (SciPy: dfn)

    • denominator degrees of freedom \(\nu_2 > 0\) (SciPy: dfd)

    • noncentrality \(\lambda \ge 0\) (SciPy: nc)

  • SciPy location/scale: \(\text{loc} \in \mathbb{R}\), \(\text{scale} > 0\), with \(X = \text{loc} + \text{scale}\,Y\), where \(Y \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda)\).

Unless stated otherwise, we work with the standard form (loc=0, scale=1).
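
As a quick check of the loc/scale convention (a small sketch; the parameter values here are arbitrary), SciPy applies its generic transformation to the standard form, so \(\text{pdf}(x;\text{loc},\text{scale}) = \text{pdf}\bigl((x-\text{loc})/\text{scale}\bigr)/\text{scale}\):

```python
import numpy as np
from scipy import stats

dfn, dfd, nc = 5.0, 20.0, 10.0
loc, scale = 1.0, 2.0  # arbitrary illustration values
x = 4.0

# SciPy's generic location/scale transform applied to the standard form
lhs = float(stats.ncf.pdf(x, dfn, dfd, nc, loc=loc, scale=scale))
rhs = float(stats.ncf.pdf((x - loc) / scale, dfn, dfd, nc)) / scale
print(lhs, rhs)  # the two values agree
```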

Notation you may see:

  • \(X \sim F_{\text{nc}}(\nu_1,\nu_2,\lambda)\) (noncentral F)

  • sometimes \(\nu_1\) / \(\nu_2\) are written as df1 / df2

2) Intuition & Motivation#

What it models#

The central F distribution arises as a ratio of two independent variance estimates. The noncentral F appears when the numerator contains a signal term (e.g., the model explains real variation).

A canonical setting is the classical linear model with Gaussian noise:

  • Under \(H_0\) (no signal), an F-statistic has a central F distribution.

  • Under \(H_1\) (signal present), the same statistic has a noncentral F distribution with noncentrality \(\lambda\).

Typical real-world use cases#

  • Power analysis for ANOVA and regression F-tests

  • Design of experiments: sample size vs detectable effect size

  • Model comparison (nested models): distribution of the test statistic under alternatives

Relations to other distributions#

  • Central F: \(\lambda=0\) recovers \(F(\nu_1,\nu_2)\).

  • Noncentral chi-square: the noncentral F is a ratio involving a noncentral chi-square random variable.

  • Noncentral t: if \(T \sim \mathrm{nct}(\nu, \delta)\) then \(T^2 \sim \mathrm{ncf}(1,\nu,\delta^2)\).

  • Poisson mixture: \(\mathrm{ncf}\) can be written as a Poisson mixture of central F distributions (very useful computationally).
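
These relations are easy to verify numerically; a minimal sketch (parameter values arbitrary): at \(\lambda=0\) the ncf PDF matches the central F, and for \(T \sim \mathrm{nct}(\nu,\delta)\), \(\mathbb{P}(T^2 \le x) = \mathbb{P}(-\sqrt{x} \le T \le \sqrt{x})\) should match the \(\mathrm{ncf}(1,\nu,\delta^2)\) CDF:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5.0, 50)

# lambda = 0 recovers the central F distribution
diff_central = np.max(np.abs(stats.ncf.pdf(x, 4, 12, 0.0) - stats.f.pdf(x, 4, 12)))

# T ~ nct(nu, delta)  =>  T^2 ~ ncf(1, nu, delta**2)
nu, delta = 9.0, 1.5
lhs = stats.ncf.cdf(x, 1, nu, delta**2)
rhs = stats.nct.cdf(np.sqrt(x), nu, delta) - stats.nct.cdf(-np.sqrt(x), nu, delta)
diff_nct = np.max(np.abs(lhs - rhs))

print("central-F check:", diff_central)
print("nct^2 check:", diff_nct)
```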

3) Formal Definition#

Generative definition#

Let

  • \(U \sim \chi'^2_{\nu_1}(\lambda)\) be a noncentral chi-square random variable with \(\nu_1\) degrees of freedom and noncentrality \(\lambda\).

  • \(V \sim \chi^2_{\nu_2}\) be an independent central chi-square random variable.

Define
\[ X \,=\, \frac{U/\nu_1}{V/\nu_2} \,=\, \frac{\nu_2}{\nu_1}\,\frac{U}{V}. \]
Then \(X \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda)\).


Central F building block#

For a central F random variable \(Y \sim F(d_1,d_2)\), \(y>0\):

PDF
\[ f_F(y; d_1,d_2) = \frac{1}{B\left(\tfrac{d_1}{2},\tfrac{d_2}{2}\right)}\left(\frac{d_1}{d_2}\right)^{d_1/2} \frac{y^{d_1/2-1}}{\left(1+\tfrac{d_1}{d_2}y\right)^{(d_1+d_2)/2}}. \]

CDF
\[ F_F(y; d_1,d_2) = I_{z}\bigl(\tfrac{d_1}{2},\tfrac{d_2}{2}\bigr),\qquad z = \frac{d_1 y}{d_1 y + d_2}, \]
where \(I_z(a,b)\) is the regularized incomplete beta function.


Poisson-mixture representation (PDF and CDF)#

A key identity is that a noncentral chi-square is a Poisson mixture of central chi-squares:

\[ U\mid K=k \;\sim\; \chi^2_{\nu_1 + 2k}, \qquad K \sim \mathrm{Poisson}(\lambda/2). \]

Let
\[ w_k = \mathbb{P}(K=k) = e^{-\lambda/2}\,\frac{(\lambda/2)^k}{k!}. \]

Then \(X\) is a Poisson mixture of central F distributions:

PDF
\[ f_X(x;\nu_1,\nu_2,\lambda) = \sum_{k=0}^{\infty} w_k\, f_F\bigl(x; \nu_1+2k,\nu_2\bigr),\qquad x>0. \]

CDF
\[ F_X(x;\nu_1,\nu_2,\lambda) = \sum_{k=0}^{\infty} w_k\, I_{z}\bigl(\tfrac{\nu_1}{2}+k,\tfrac{\nu_2}{2}\bigr), \qquad z = \frac{\nu_1 x}{\nu_1 x + \nu_2}. \]

These series are the most common “closed forms” used in practice.

def validate_ncf_params(dfn: float, dfd: float, nc: float):
    dfn = float(dfn)
    dfd = float(dfd)
    nc = float(nc)
    if not (dfn > 0):
        raise ValueError(f"dfn must be > 0, got {dfn!r}")
    if not (dfd > 0):
        raise ValueError(f"dfd must be > 0, got {dfd!r}")
    if not (nc >= 0):
        raise ValueError(f"nc must be >= 0, got {nc!r}")
    return dfn, dfd, nc


def f_logpdf(x, dfn: float, dfd: float):
    """Log-PDF of the central F(dfn, dfd) distribution (standard form)."""
    dfn = float(dfn)
    dfd = float(dfd)
    x = np.asarray(x, dtype=float)

    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xm = x[mask]

    a = 0.5 * dfn
    b = 0.5 * dfd

    logc = a * np.log(dfn / dfd) - special.betaln(a, b)
    out[mask] = logc + (a - 1.0) * np.log(xm) - (a + b) * np.log1p((dfn / dfd) * xm)
    return out


def f_pdf(x, dfn: float, dfd: float):
    return np.exp(f_logpdf(x, dfn, dfd))


def f_cdf(x, dfn: float, dfd: float):
    """CDF of the central F(dfn, dfd) distribution (standard form)."""
    dfn = float(dfn)
    dfd = float(dfd)
    x = np.asarray(x, dtype=float)

    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    xm = x[mask]
    z = (dfn * xm) / (dfn * xm + dfd)
    out[mask] = special.betainc(0.5 * dfn, 0.5 * dfd, z)
    return out


# Quick sanity check vs SciPy's central F
x_demo = np.linspace(0.1, 3.0, 5)
print("max |pdf - scipy|:", np.max(np.abs(f_pdf(x_demo, 5, 20) - stats.f.pdf(x_demo, 5, 20))))
print("max |cdf - scipy|:", np.max(np.abs(f_cdf(x_demo, 5, 20) - stats.f.cdf(x_demo, 5, 20))))
max |pdf - scipy|: 2.275957200481571e-15
max |cdf - scipy|: 3.3306690738754696e-16
def poisson_weights_trunc(lam: float, tol: float = 1e-12, k_max: int = 10_000):
    """Truncate Poisson(lam) weights so that remaining tail mass is < tol."""
    lam = float(lam)
    if lam < 0:
        raise ValueError("lam must be >= 0")

    if lam == 0.0:
        return np.array([1.0]), 1.0

    weights = np.empty(k_max + 1, dtype=float)
    weights[0] = np.exp(-lam)
    mass = weights[0]

    k = 0
    # w_{k+1} = w_k * lam/(k+1)
    while k < k_max and (1.0 - mass) > tol:
        k += 1
        weights[k] = weights[k - 1] * lam / k
        mass += weights[k]

    return weights[: k + 1], mass


def ncf_pdf_series(x, dfn: float, dfd: float, nc: float, tol: float = 1e-12, k_max: int = 10_000):
    """PDF of ncf(dfn, dfd, nc) via Poisson-mixture series truncation."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    x = np.asarray(x, dtype=float)

    weights, mass = poisson_weights_trunc(nc / 2.0, tol=tol, k_max=k_max)

    out = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(weights):
        out += wk * f_pdf(x, dfn + 2 * k, dfd)

    # Optional: renormalize because we truncated the Poisson tail.
    return out / mass


def ncf_cdf_series(x, dfn: float, dfd: float, nc: float, tol: float = 1e-12, k_max: int = 10_000):
    """CDF of ncf(dfn, dfd, nc) via Poisson-mixture series truncation."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    x = np.asarray(x, dtype=float)

    weights, mass = poisson_weights_trunc(nc / 2.0, tol=tol, k_max=k_max)

    out = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(weights):
        out += wk * f_cdf(x, dfn + 2 * k, dfd)

    return out / mass


# Compare series approximation to SciPy (moderate parameters)
dfn, dfd, nc = 5, 20, 10
x_grid = np.linspace(0.001, 6.0, 400)

pdf_scipy = stats.ncf.pdf(x_grid, dfn, dfd, nc)
cdf_scipy = stats.ncf.cdf(x_grid, dfn, dfd, nc)

pdf_series = ncf_pdf_series(x_grid, dfn, dfd, nc, tol=1e-12)
cdf_series = ncf_cdf_series(x_grid, dfn, dfd, nc, tol=1e-12)

print("max |pdf_series - pdf_scipy|:", float(np.max(np.abs(pdf_series - pdf_scipy))))
print("max |cdf_series - cdf_scipy|:", float(np.max(np.abs(cdf_series - cdf_scipy))))
max |pdf_series - pdf_scipy|: 0.8056810852034035
max |cdf_series - cdf_scipy|: 0.6692626621651225

4) Moments & Properties#

Existence of moments#

From the ratio representation \(X = (U/\nu_1)/(V/\nu_2)\), negative moments of \(V\sim\chi^2_{\nu_2}\) appear. As a result:

  • The \(r\)-th raw moment \(\mathbb{E}[X^r]\) exists only if \(\nu_2 > 2r\).

    • mean exists if \(\nu_2 > 2\)

    • variance exists if \(\nu_2 > 4\)

    • skewness needs \(\nu_2 > 6\)

    • kurtosis needs \(\nu_2 > 8\)

Mean and variance (closed form)#

Let \(X \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda)\).

Mean (for \(\nu_2>2\)):
\[ \mathbb{E}[X] = \frac{\nu_2}{\nu_2-2}\left(1+\frac{\lambda}{\nu_1}\right) = \frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)}. \]

Variance (for \(\nu_2>4\)):
\[ \mathrm{Var}(X) = \frac{2\,(\nu_2/\nu_1)^2\left[(\nu_2-2)(\nu_1+2\lambda) + (\nu_1+\lambda)^2\right]}{(\nu_2-2)^2(\nu_2-4)}. \]

These reduce to the familiar central-F formulas when \(\lambda=0\).
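
This reduction is easy to confirm (a small sketch; parameter values arbitrary): setting \(\lambda=0\) in the formulas above should reproduce SciPy's central-F mean and variance.

```python
from scipy import stats

dfn, dfd = 5.0, 20.0

# Closed forms above with lambda = 0
mean_formula = dfd * (dfn + 0.0) / (dfn * (dfd - 2))
var_formula = (2 * (dfd / dfn) ** 2
               * ((dfd - 2) * (dfn + 0.0) + (dfn + 0.0) ** 2)
               / ((dfd - 2) ** 2 * (dfd - 4)))

m, v = stats.f.stats(dfn, dfd, moments="mv")
print(mean_formula, float(m))
print(var_formula, float(v))
```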

Skewness and kurtosis#

Closed forms exist but get algebraically bulky. A practical approach is:

  1. compute raw moments \(\mathbb{E}[X],\dots,\mathbb{E}[X^4]\) (when they exist),

  2. convert to central moments, then to skewness/kurtosis.

MGF / characteristic function#

  • The MGF \(M_X(t)=\mathbb{E}[e^{tX}]\) does not exist for any \(t>0\) (polynomial right tail, like the central F).

  • The Laplace transform \(\mathbb{E}[e^{tX}]\) exists for \(t<0\) and can be computed numerically.

  • The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\).

Entropy#

There is no simple elementary closed form in general. In practice, use scipy.stats.ncf.entropy, which evaluates the differential entropy numerically.
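
A quick numerical cross-check (a sketch; parameters arbitrary): integrate \(-f \log f\) directly with scipy.integrate.quad and compare against SciPy's entropy value.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dfn, dfd, nc = 5.0, 20.0, 10.0

def integrand(x):
    # -f(x) log f(x); logpdf evaluates the log safely
    return -stats.ncf.pdf(x, dfn, dfd, nc) * stats.ncf.logpdf(x, dfn, dfd, nc)

h_quad, _ = quad(integrand, 0.0, np.inf, limit=200)
h_scipy = float(stats.ncf.entropy(dfn, dfd, nc))
print(h_quad, h_scipy)
```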

import math


def ncx2_raw_moments(dof: float, nc: float, max_order: int = 4):
    """Raw moments E[U^r] for U ~ noncentral chi-square(dof, nc), for r=1..max_order.

    Uses cumulants from log-MGF and converts cumulants -> raw moments.
    """
    dof = float(dof)
    nc = float(nc)

    if max_order > 4:
        raise ValueError("This helper is implemented up to order 4.")

    # cumulant κ_r = 2^{r-1} (r-1)! (dof + r*nc)
    kappa = [None]
    for r in range(1, max_order + 1):
        kappa_r = (2 ** (r - 1)) * math.factorial(r - 1) * (dof + r * nc)
        kappa.append(kappa_r)

    k1 = kappa[1]
    if max_order == 1:
        return np.array([k1])

    k2 = kappa[2]
    m1 = k1
    m2 = k2 + k1**2
    if max_order == 2:
        return np.array([m1, m2])

    k3 = kappa[3]
    m3 = k3 + 3 * k2 * k1 + k1**3
    if max_order == 3:
        return np.array([m1, m2, m3])

    k4 = kappa[4]
    m4 = k4 + 4 * k3 * k1 + 3 * (k2**2) + 6 * k2 * (k1**2) + k1**4
    return np.array([m1, m2, m3, m4])


def chisquare_neg_moment(dof: float, r: int):
    """E[V^{-r}] for V ~ chi-square(dof). Requires dof > 2r."""
    dof = float(dof)
    r = int(r)
    if dof <= 2 * r:
        return np.nan

    k = 0.5 * dof
    theta = 2.0
    # For Gamma(k, theta): E[V^{-r}] = theta^{-r} * Γ(k-r)/Γ(k).
    # Use gammaln to avoid overflow of Γ(k) for large dof.
    return (theta ** (-r)) * np.exp(special.gammaln(k - r) - special.gammaln(k))


def ncf_raw_moments(dfn: float, dfd: float, nc: float):
    """Raw moments E[X^r] for X ~ ncf(dfn, dfd, nc), r=1..4 when they exist."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)

    mU = ncx2_raw_moments(dfn, nc, max_order=4)

    out = []
    for r in range(1, 5):
        ev_inv = chisquare_neg_moment(dfd, r)
        if not np.isfinite(ev_inv):
            out.append(np.nan)
            continue
        out.append(((dfd / dfn) ** r) * mU[r - 1] * ev_inv)

    return np.array(out, dtype=float)


def raw_to_central(m1, m2, m3, m4):
    """Convert raw moments to central moments (μ2, μ3, μ4)."""
    mu = m1
    mu2 = m2 - mu**2
    mu3 = m3 - 3 * mu * m2 + 2 * mu**3
    mu4 = m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 3 * mu**4
    return mu2, mu3, mu4


def ncf_mean_closed(dfn: float, dfd: float, nc: float):
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    if dfd <= 2:
        return np.nan
    return dfd * (dfn + nc) / (dfn * (dfd - 2))


def ncf_var_closed(dfn: float, dfd: float, nc: float):
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    if dfd <= 4:
        return np.nan
    num = 2 * (dfd / dfn) ** 2 * ((dfd - 2) * (dfn + 2 * nc) + (dfn + nc) ** 2)
    den = (dfd - 2) ** 2 * (dfd - 4)
    return num / den


dfn, dfd, nc = 5, 20, 10
m1, m2, m3, m4 = ncf_raw_moments(dfn, dfd, nc)
mu2, mu3, mu4 = raw_to_central(m1, m2, m3, m4)

skew = mu3 / (mu2 ** 1.5)
excess_kurt = mu4 / (mu2**2) - 3.0

print("Raw moments E[X^r], r=1..4:", np.array([m1, m2, m3, m4]))
print("Mean (closed):", ncf_mean_closed(dfn, dfd, nc))
print("Var  (closed):", ncf_var_closed(dfn, dfd, nc))
print("Skewness:", skew)
print("Excess kurtosis:", excess_kurt)

mvsk = stats.ncf.stats(dfn, dfd, nc, moments="mvsk")
print("SciPy mvsk:", mvsk)
Raw moments E[X^r], r=1..4: [  3.3333  15.2778  93.7302 765.    ]
Mean (closed): 3.3333333333333335
Var  (closed): 4.166666666666667
Skewness: 1.766743077969323
Excess kurtosis: 6.412571428571438
SciPy mvsk: (3.3333333333333335, 4.166666666666667, 1.7667430779693272, 6.412571428571429)

5) Parameter Interpretation#

Degrees of freedom \(\nu_1\) (numerator)#

  • Comes from the dimension of the signal/constraint being tested.

  • Larger \(\nu_1\) moves mass away from 0 (for \(\nu_1 > 2\) the density vanishes at \(x=0\)) and makes the ratio less variable.

Degrees of freedom \(\nu_2\) (denominator)#

  • Often corresponds to residual degrees of freedom.

  • Controls tail heaviness and moment existence: small \(\nu_2\) implies very heavy tails.

Noncentrality \(\lambda\)#

  • Encodes how far from the null you are.

  • In many tests, \(\lambda\) is proportional to sample size × effect size².

  • Increasing \(\lambda\) shifts mass to the right (larger typical F-statistics) and increases power.
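
To make the "sample size × effect size²" scaling concrete, here is a hedged sketch using the common one-way ANOVA convention \(\lambda = N f^2\), where \(N\) is the total sample size and \(f\) is Cohen's f (this parameterization is a widely used convention, not something specific to SciPy):

```python
from scipy import stats

k = 4            # number of groups (illustrative)
n_per = 15       # per-group sample size (illustrative)
f_effect = 0.25  # Cohen's f; "medium" by common rules of thumb

N = k * n_per
lam = N * f_effect**2          # noncentrality under the alternative
dfn, dfd = k - 1, N - k

f_crit = stats.f.ppf(0.95, dfn, dfd)
power = float(stats.ncf.sf(f_crit, dfn, dfd, lam))
print("lambda:", lam, "power:", power)
```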

Below we visualize how the PDF changes as we vary parameters.

def plot_pdf_family(dfn: float, dfd: float, ncs, x_max: float = 6.0):
    x = np.linspace(0.001, x_max, 600)
    fig = go.Figure()
    for nc in ncs:
        fig.add_trace(
            go.Scatter(x=x, y=stats.ncf.pdf(x, dfn, dfd, nc), mode="lines", name=f"nc={nc}")
        )
    fig.update_layout(
        title=f"ncf PDF family (dfn={dfn}, dfd={dfd})",
        xaxis_title="x",
        yaxis_title="pdf(x)",
        width=900,
        height=450,
    )
    fig.show()


plot_pdf_family(dfn=5, dfd=20, ncs=[0, 2, 5, 10, 20], x_max=8.0)

6) Derivations#

6.1 Expectation#

Using \(X = \frac{\nu_2}{\nu_1}\frac{U}{V}\) with independent \(U\) and \(V\):

\[ \mathbb{E}[X] = \frac{\nu_2}{\nu_1}\,\mathbb{E}[U] \,\mathbb{E}[V^{-1}]. \]

For \(U\sim\chi'^2_{\nu_1}(\lambda)\): \(\mathbb{E}[U] = \nu_1 + \lambda\).

For \(V\sim\chi^2_{\nu_2}\) (a Gamma distribution), for \(\nu_2>2\): \(\mathbb{E}[V^{-1}] = \frac{1}{\nu_2-2}\).

Putting it together:
\[ \mathbb{E}[X] = \frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)}. \]
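
Numerically, this matches SciPy's mean (a quick sketch with arbitrary parameters):

```python
from scipy import stats

dfn, dfd, nc = 5.0, 20.0, 10.0

# Closed form derived above: nu2*(nu1 + lambda) / (nu1*(nu2 - 2))
mean_formula = dfd * (dfn + nc) / (dfn * (dfd - 2))
mean_scipy = float(stats.ncf.mean(dfn, dfd, nc))
print(mean_formula, mean_scipy)  # both equal 10/3
```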

6.2 Variance#

Similarly,
\[ \mathbb{E}[X^2] = \left(\frac{\nu_2}{\nu_1}\right)^2 \mathbb{E}[U^2] \, \mathbb{E}[V^{-2}]. \]

For \(\nu_2>4\), \(\mathbb{E}[V^{-2}] = \frac{1}{(\nu_2-2)(\nu_2-4)}\).

For \(U\sim\chi'^2_{\nu_1}(\lambda)\),
\[ \mathrm{Var}(U)=2(\nu_1+2\lambda) \quad\Rightarrow\quad \mathbb{E}[U^2] = \mathrm{Var}(U) + \mathbb{E}[U]^2. \]

Then \(\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2\) yields the closed form from Section 4.
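
The second-moment identity can be checked against SciPy's moment computation (a sketch; parameter values arbitrary):

```python
from scipy import stats

dfn, dfd, nc = 5.0, 20.0, 10.0

EU = dfn + nc                            # E[U]
EU2 = 2 * (dfn + 2 * nc) + EU**2         # Var(U) + E[U]^2
EVinv2 = 1.0 / ((dfd - 2) * (dfd - 4))   # E[V^{-2}], valid for dfd > 4

m2_formula = (dfd / dfn) ** 2 * EU2 * EVinv2
m2_scipy = float(stats.ncf.moment(2, dfn, dfd, nc))
print(m2_formula, m2_scipy)
```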

6.3 Likelihood#

Given data \(x_1,\dots,x_n\) assumed i.i.d. from \(\mathrm{NCF}(\nu_1,\nu_2,\lambda)\), the likelihood is

\[ L(\nu_1,\nu_2,\lambda) = \prod_{i=1}^n f_X(x_i;\nu_1,\nu_2,\lambda), \qquad \ell = \sum_{i=1}^n \log f_X(x_i;\nu_1,\nu_2,\lambda). \]

Because \(f_X\) is a series (or special-function expression), MLE is typically done numerically with constraints \(\nu_1>0,\nu_2>0,\lambda\ge 0\).

# Likelihood demo: estimate nc with dfn/dfd fixed

dfn_true, dfd_true, nc_true = 5, 20, 10
n = 2_000
x = stats.ncf.rvs(dfn_true, dfd_true, nc_true, size=n, random_state=rng)


def neg_loglik_nc(nc: float) -> float:
    if nc < 0:
        return np.inf
    return float(-np.sum(stats.ncf.logpdf(x, dfn_true, dfd_true, nc)))


res = optimize.minimize_scalar(neg_loglik_nc, bounds=(0.0, 40.0), method="bounded")
print("True nc:", nc_true)
print("Estimated nc (MLE, dfn/dfd fixed):", float(res.x))
print("Optimization success:", res.success)
True nc: 10
Estimated nc (MLE, dfn/dfd fixed): 9.982072333858543
Optimization success: True

7) Sampling & Simulation (NumPy-only)#

A clean NumPy-only sampler comes directly from the Poisson-mixture construction.

  1. Draw \(K \sim \mathrm{Poisson}(\lambda/2)\).

  2. Draw \(U \sim \chi^2_{\nu_1+2K}\) (central chi-square).

  3. Draw \(V \sim \chi^2_{\nu_2}\).

  4. Return \(X = (U/\nu_1)/(V/\nu_2)\).

This works because a noncentral chi-square is a Poisson mixture of central chi-squares.

def ncf_rvs_numpy(rng: np.random.Generator, dfn: float, dfd: float, nc: float, size):
    """Sample from ncf(dfn, dfd, nc) using NumPy only (Poisson-mixture sampler)."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)

    lam = 0.5 * nc
    k = rng.poisson(lam, size=size)

    # If df is an array, Generator.chisquare broadcasts and returns the same shape.
    u = rng.chisquare(dfn + 2.0 * k)
    v = rng.chisquare(dfd, size=size)

    return (u / dfn) / (v / dfd)


dfn, dfd, nc = 5, 20, 10
samples = ncf_rvs_numpy(rng, dfn, dfd, nc, size=200_000)

print("Sample mean:", float(np.mean(samples)))
print("Theory mean:", ncf_mean_closed(dfn, dfd, nc))
print("Sample var:", float(np.var(samples)))
print("Theory var:", ncf_var_closed(dfn, dfd, nc))
Sample mean: 3.329072332845629
Theory mean: 3.3333333333333335
Sample var: 4.114308408422413
Theory var: 4.166666666666667
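
Beyond matching the first two moments, the whole distribution can be validated with a Kolmogorov–Smirnov test against scipy.stats.ncf's CDF. The sketch below re-implements the Poisson-mixture sampler so it is self-contained; with a correct sampler the KS statistic should be on the order of \(1/\sqrt{n}\).

```python
import numpy as np
from scipy import stats

def ncf_rvs(rng, dfn, dfd, nc, size):
    # Poisson-mixture sampler: K ~ Poisson(nc/2), U ~ chi2(dfn + 2K), V ~ chi2(dfd)
    k = rng.poisson(0.5 * nc, size=size)
    u = rng.chisquare(dfn + 2.0 * k)
    v = rng.chisquare(dfd, size=size)
    return (u / dfn) / (v / dfd)

rng = np.random.default_rng(0)
dfn, dfd, nc = 5, 20, 10
s = ncf_rvs(rng, dfn, dfd, nc, size=20_000)

ks = stats.kstest(s, lambda q: stats.ncf.cdf(q, dfn, dfd, nc))
print("KS statistic:", ks.statistic, "p-value:", ks.pvalue)
```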

8) Visualization#

We’ll visualize:

  • the PDF and CDF (SciPy vs the truncated-series implementation)

  • Monte Carlo samples (histogram and empirical CDF)

dfn, dfd, nc = 5, 20, 10

# Monte Carlo samples
n = 50_000
s = ncf_rvs_numpy(rng, dfn, dfd, nc, size=n)

# Grids
x = np.linspace(0.001, 8.0, 700)

pdf_scipy = stats.ncf.pdf(x, dfn, dfd, nc)
cdf_scipy = stats.ncf.cdf(x, dfn, dfd, nc)

pdf_series = ncf_pdf_series(x, dfn, dfd, nc)
cdf_series = ncf_cdf_series(x, dfn, dfd, nc)

# Empirical CDF
s_sorted = np.sort(s)
ecdf_y = (np.arange(1, n + 1) / n).astype(float)

fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))

# PDF panel
fig.add_trace(
    go.Scatter(x=x, y=pdf_scipy, mode="lines", name="SciPy pdf"), row=1, col=1
)
fig.add_trace(
    go.Scatter(x=x, y=pdf_series, mode="lines", name="Series pdf", line=dict(dash="dash")),
    row=1,
    col=1,
)
fig.add_trace(
    go.Histogram(
        x=s,
        nbinsx=80,
        histnorm="probability density",
        name="MC hist",
        opacity=0.35,
    ),
    row=1,
    col=1,
)

# CDF panel
fig.add_trace(
    go.Scatter(x=x, y=cdf_scipy, mode="lines", name="SciPy cdf"), row=1, col=2
)
fig.add_trace(
    go.Scatter(x=x, y=cdf_series, mode="lines", name="Series cdf", line=dict(dash="dash")),
    row=1,
    col=2,
)
fig.add_trace(
    go.Scatter(
        x=s_sorted,
        y=ecdf_y,
        mode="lines",
        name="Empirical CDF",
        line=dict(width=1),
    ),
    row=1,
    col=2,
)

fig.update_layout(
    title=f"Noncentral F: dfn={dfn}, dfd={dfd}, nc={nc}",
    width=1100,
    height=450,
)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="CDF", row=1, col=2)
fig.show()

9) SciPy Integration#

SciPy provides a full-featured implementation: scipy.stats.ncf.

Common methods:

  • pdf, logpdf

  • cdf, sf (survival function), logsf

  • ppf (quantiles)

  • rvs (sampling)

  • stats(moments='mvsk')

  • entropy (numerical)

  • scipy.stats.fit for MLE with bounds/constraints

Note: because ncf has multiple parameters (plus loc/scale), unconstrained fitting can be sensitive.

ncf = stats.ncf

dfn, dfd, nc = 5, 20, 10
x0 = 1.5

print("pdf(x0):", float(ncf.pdf(x0, dfn, dfd, nc)))
print("cdf(x0):", float(ncf.cdf(x0, dfn, dfd, nc)))

s = ncf.rvs(dfn, dfd, nc, size=5, random_state=rng)
print("rvs:", s)

print("mvsk:", ncf.stats(dfn, dfd, nc, moments="mvsk"))
print("entropy:", float(ncf.entropy(dfn, dfd, nc)))

# Fitting example using scipy.stats.fit (global optimization by default)
data = ncf.rvs(dfn, dfd, nc, size=400, random_state=rng)

fit_res = stats.fit(
    ncf,
    data,
    bounds={
        # Fix loc/scale to standard form
        "loc": (0.0, 0.0),
        "scale": (1.0, 1.0),
        # Either estimate or constrain shape parameters
        "dfn": (dfn, dfn),
        "dfd": (dfd, dfd),
        "nc": (0.0, 50.0),
    },
)

print(fit_res)
print("nc_hat:", float(fit_res.params.nc))
pdf(x0): 0.22501760747977675
cdf(x0): 0.14848821017085254
rvs: [2.4224 2.2932 2.5741 1.5943 3.1304]
mvsk: (3.3333333333333335, 4.166666666666667, 1.7667430779693272, 6.412571428571429)
entropy: 1.9476495499863073
  params: FitParams(dfn=5.0, dfd=20.0, nc=9.966537234116151, loc=0.0, scale=1.0)
 success: True
 message: 'Optimization terminated successfully.'
nc_hat: 9.966537234116151

10) Statistical Use Cases#

10.1 Hypothesis testing: power of an F-test#

If an F-statistic has distribution:

  • under \(H_0\): \(F \sim F(\nu_1,\nu_2)\)

  • under \(H_1\): \(F \sim F_{\text{nc}}(\nu_1,\nu_2,\lambda)\)

then for significance level \(\alpha\):

  • critical value \(f_{\text{crit}} = F^{-1}_{\nu_1,\nu_2}(1-\alpha)\)

  • power \(= \mathbb{P}(F > f_{\text{crit}} \mid H_1) = 1 - F_{\text{nc}}(f_{\text{crit}};\nu_1,\nu_2,\lambda)\).
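
Plugging in numbers (a minimal sketch; the degrees of freedom and noncentrality here are arbitrary):

```python
from scipy import stats

dfn, dfd, lam, alpha = 3, 30, 12.0, 0.05

f_crit = stats.f.ppf(1 - alpha, dfn, dfd)           # central-F critical value
power = float(stats.ncf.sf(f_crit, dfn, dfd, lam))  # P(F > f_crit | H1)
print(f"f_crit={f_crit:.3f}, power={power:.3f}")
```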

10.2 Bayesian modeling: uncertain effect size#

In Bayesian design / power analysis, you might place a prior on an effect size parameter that implies a prior over \(\lambda\). Then the expected power is an average over that prior.

10.3 Generative modeling#

The noncentral F is a flexible positive distribution for ratios; it can be used as a generative component when the process naturally looks like “signal+noise over noise”. Often, the most useful role is modeling (or simulating) the distribution of a test statistic under plausible alternatives.

# Power curve as a function of the noncentrality parameter

dfn, dfd = 5, 20
alpha = 0.05

f_crit = stats.f.ppf(1 - alpha, dfn, dfd)

nc_grid = np.linspace(0.0, 40.0, 300)
power = stats.ncf.sf(f_crit, dfn, dfd, nc_grid)

fig = go.Figure()
fig.add_trace(go.Scatter(x=nc_grid, y=power, mode="lines", name="power"))
fig.update_layout(
    title=f"Power vs noncentrality (dfn={dfn}, dfd={dfd}, α={alpha}, f_crit={f_crit:.3f})",
    xaxis_title="noncentrality nc",
    yaxis_title="power = P(F > f_crit | H1)",
    width=900,
    height=450,
    yaxis=dict(range=[0, 1]),
)
fig.show()

# Bayesian-flavored: prior over nc -> distribution of power
# (Here we use a simple Gamma prior on nc for illustration.)
prior_nc = rng.gamma(shape=2.0, scale=5.0, size=50_000)  # mean 10
prior_power = stats.ncf.sf(f_crit, dfn, dfd, prior_nc)

print("E_prior[power]:", float(np.mean(prior_power)))
print("90% prior interval for power:", np.quantile(prior_power, [0.05, 0.95]))
E_prior[power]: 0.49195779903889264
90% prior interval for power: [0.121  0.9291]

11) Pitfalls#

  • Invalid parameters: require dfn > 0, dfd > 0, nc >= 0.

  • Moment existence: mean requires dfd > 2, variance dfd > 4, skewness dfd > 6, kurtosis dfd > 8.

  • Numerical issues:

    • The PDF/CDF involve infinite series or special functions; naive truncation can be inaccurate for extreme parameters.

    • Prefer scipy.stats.ncf.logpdf, logcdf, sf, logsf in tail computations.

    • For very large nc, the Poisson mixture may need many terms.

  • Fitting can be unstable:

    • Many parameters (including loc/scale) can lead to identifiability issues.

    • Constrain/fix parameters when you have domain knowledge (e.g., loc=0, scale=1).
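
To quantify the "very large nc needs many terms" pitfall, one can ask how far the Poisson weights must be summed before the tail mass drops below a tolerance; the required index grows roughly like \(\lambda/2 + O(\sqrt{\lambda})\). A quick estimate via the Poisson quantile function (a sketch):

```python
from scipy import stats

tol = 1e-12
terms = {}
for nc in (10, 100, 1000, 10000):
    lam = nc / 2.0
    # smallest k with P(K > k) < tol, estimated via the Poisson quantile function
    terms[nc] = int(stats.poisson.ppf(1.0 - tol, lam))
    print(f"nc={nc:>6}: ~{terms[nc]} mixture terms")
```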

12) Summary#

  • ncf is the noncentral analogue of the F distribution and describes many F-statistics under alternatives.

  • It can be defined as a ratio of an independent noncentral chi-square and central chi-square.

  • The PDF/CDF are naturally expressed as a Poisson mixture of central F distributions.

  • Mean/variance have clean closed forms (when dfd is large enough); higher moments require stronger conditions.

  • A simple NumPy-only sampler uses the Poisson-mixture construction.

  • For practical work (tails, fitting, entropy), rely on scipy.stats.ncf and prefer log-domain functions.